CS EE World
https://cseeworld.wixsite.com/home 
May 2021 

28/34 

A 

Submitter Info: 

Anonymous 


Extended Essay in Computer Science 


Examination Session — May 2021 


Title — Investigating the Algorithmic Efficiency of Binary Search Tree and 


Binary Heap Based Sorting Algorithms 


Research Question - How does the sorting efficiency of the Tree Sort compare 
to that of the Heap Sort in terms of time complexity for increasing sizes of 


randomized integer datasets? 


Word Count - 3997 


Table of Contents

1. Introduction
2. Theory
   2.1. Sorting Algorithms
   2.2. Tree Sort & Binary Search Trees
   2.3. Heap Sort & Binary Heaps
3. Hypothesis
4. Methodology
   4.1. Independent Variable
   4.2. Dependent Variable
   4.3. Controlled Variables
   4.4. Procedure
5. Data Processing and Graphing
   5.1. Raw Data Collection
   5.2. Graphs and Curve Fitting
6. Analysis
7. Results Discussion & Evaluation
8. Conclusion
9. Further Scope
10. Bibliography
11. Appendices
   11.1. Appendix A
   11.2. Appendix B
   11.3. Appendix C
   11.4. Appendix D


1. Introduction 


The primary focus of this essay is to investigate the computational complexities or the sorting 
efficiencies of binary-tree based sorting algorithms, a class of algorithms based on binary 
abstract data structures. Today, sorting is one of the most popular and useful computational 
processes, and hence, performing a comparative study between a specific set of these 
algorithms is crucial. Thus, this essay will look specifically into two sorting algorithms: Tree 
Sort, which is based on Binary Search Trees (BST) and the Heap Sort, which is based on Binary 
Heaps. These algorithms will be compared in terms of their time complexity: the time taken 
for algorithm execution based on the input dataset size. Hence, this gives rise to the research 
question: “How does the sorting efficiency of the Tree Sort compare to that of the Heap 


Sort in terms of time complexity for increasing sizes of randomized integer datasets?” 


2. Theory 


2.1 Sorting Algorithms 


Sorting algorithms are among the simplest yet most distinctive classes of algorithms. A sorting algorithm performs a series of operations on a set of integers and outputs them in sorted (ascending) order. For example:

[5, 3, 2, 4, 1] → [1, 2, 3, 4, 5]


As shown above, the concept of sorting is straightforward. However, the approaches taken to 
sorting can be very diverse. Hence, sorting algorithms can further be classified into 
Comparison Sorts and Integer Sorts.¹ Comparison sorts are based on comparing two


elements to determine if one should be before or after the other in the sorted list. A few 


1 "Difference between Comparison (QuickSort) and Non-Comparison (Counting Sort) Based Sorting Algorithms?,” 
Javarevisited, accessed July 12, 2020, https://javarevisited.blogspot.com/2017/02/difference-between-comparison- 
quicksort-and-non-comparison-counting-sort-algorithms.html#axzz6nplsEjux. 


examples are the Heap Sort and Merge Sort. On the contrary, Integer Sorts determine the number of elements which are lesser in value than a selected element, based on its integer key, to identify the correct position of this element in the list without requiring extensive comparisons.² A few examples are the Radix Sort and Bucket Sort.³ An example of a comparison sorting algorithm is shown below:

Comparison between integers

Figure 1 - Visualization of Merge Sort⁴


Figure 1 is a depiction of the merge sort which portrays single comparisons between pairs of 
integers as a means to sort an array. This brings into picture sorting algorithm design paradigms 
such as divide & conquer and recursion,⁵ and also introduces time complexity as a means for


algorithmic analysis. 


2 ibid. 

3 ibid. 

4 Nikhil Joshi, "Implementation and Analysis of Merge Sort," Dotnetlovers (Dotnetlovers, October 29, 2018), accessed July
12, 2020, https://www.dotnetlovers.com/article/128/implementation-and-analysis-of-merge-sort. 

5 TimTim 1, "Divide and Conquer and Recursion," Stack Overflow, January 1, 2009, accessed July 12, 2020, 
https://www.stackoverflow.com/questions/2249767/divide-and-conquer-and-recursion. 


The primary method of measuring the efficiency of a sorting algorithm is to measure its time complexity. In particular, asymptotic time complexity, i.e., how the execution time grows as the dataset size approaches infinity, gives a better understanding of algorithm efficiency. It is commonly described with three notations: O(n), an upper bound, used here for the worst-case complexity; Ω(n), a lower bound, used here for the best-case complexity; and Θ(n), a tight bound, used here for the average-case complexity.⁶ These functions tell us the limits of, and the average running time of, any algorithm, as depicted below:

Figure 2 - Asymptotic Time Complexity Parameters⁷

Therefore, before experimentally determining the running time of Tree Sort and Heap Sort, which are both comparison sorts, we can mathematically derive a lower bound on their time complexity to anticipate the trend in running time.


6 “Asymptotic Analysis: Big-O Notation and More,” Programiz, accessed July 12, 2020, https://www.programiz.com/dsa/ 
asymptotic-notations. 

7 “Big-O Notation (Article) | Algorithms,” Khan Academy, Khan Academy, accessed July 12, 2020, 
https://www.khanacademy.org/computing/computer-science/algorithms/asymptotic-notation/a/big-o-notation. 


Consider a decision tree whose leaves are all possible permutations (n!) of a set of n integers, and in which a comparison sort is modelled by a root-to-leaf path where each step is a comparison. The number of comparisons is then limited by the height h of the tree.⁸ As 2^h is the maximum number of leaves of the decision tree as a function of the height:

$$2^h \geq n! \implies h \geq \log_2(n!)$$

Equation 1 - Relationship Between Number of Comparisons and Height of a Binary Tree⁹

Using Stirling's Approximation:

$$n! > \left(\frac{n}{e}\right)^n$$

$$h \geq \log_2\left(\frac{n}{e}\right)^n = n \log_2(n) - n \log_2(e) = \Omega(n \log(n))$$

Equation 2 - Deriving the Lower-Bound Time Complexity of Comparison Sorts¹⁰
Consequently, we know that the running time of both Tree Sort and Heap Sort will not be better than n · log(n).
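As a rough numerical illustration of this bound, the following Java sketch compares log2(n!), the minimum number of comparisons implied by the decision tree argument, with n · log2(n). The class and method names are hypothetical and are not part of the essay's appendices.

```java
// Sketch: compare the decision-tree minimum number of comparisons, log2(n!),
// with n*log2(n) for a few dataset sizes. All names here are illustrative.
class ComparisonLowerBound {

    // log2(n!) computed as a sum of logarithms, which avoids overflowing n!.
    static double log2Factorial(int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; k++) {
            sum += Math.log(k) / Math.log(2);
        }
        return sum;
    }

    // n * log2(n), the leading term of the bound.
    static double nLog2n(int n) {
        return n * (Math.log(n) / Math.log(2));
    }

    public static void main(String[] args) {
        int[] sizes = {5, 10000, 100000};
        for (int n : sizes) {
            System.out.printf("n=%d  log2(n!)=%.0f  n*log2(n)=%.0f%n",
                    n, log2Factorial(n), nLog2n(n));
        }
    }
}
```

For every size the first value stays between n · log2(n) − n · log2(e) and n · log2(n), which is what Equation 2 states.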


2.2 Tree Sort & Binary Search Trees 


A binary tree is an abstract data structure composed of nodes. Each node has some data (integers in this case), and has pointers to a left and right child node. The topmost node is called the root, and a node with no child nodes is called a leaf. A Binary Search Tree (BST) is a special binary tree with certain properties. Every value in the left sub-tree of a node must be less than the value of that node, and every value in its right sub-tree must be greater than the value of that node.¹¹ An example is shown below:


8 Karleigh Moore, "Sorting Algorithms,” Brilliant Math & Science Wiki, accessed July 24, 2020, https://www.brilliant.org/ 
wiki/sorting-algorithms/. 

9 ibid.

10 ibid. 

11 "Data Structure - Binary Search Tree,” Tutorialspoint, accessed July 24, 2020, https://www.tutorialspoint.com/data_structures_algorithms/binary_search_tree.htm.


Figure 3 - Binary Search Tree Example 


For sorting an integer dataset with tree sort, the integers must first be inserted into the BST 


through the following procedure — 


1. If the root node is null, i.e., the BST is empty, then the root node is set to this value.

2. If the root node is present, the value being inserted is less than the root, and the left child 
node is null, then the left child will be set to this value. 

3. If the root node is present, the value being inserted is greater than the root, and the right 
child node is null, then the right child will be set to this value. 

4. If the child nodes already exist, this logic will occur recursively until a null child is found.¹²


This value will then be assigned to a new leaf node. A sample insertion is shown below. 


Less than root node

Figure 4 - Binary Search Tree Insertion (Comparison with Root Node)


12 Robert Sedgewick, and Kevin Wayne, “Binary Search Trees,” Princeton University, The Trustees of Princeton University, 
accessed July 24, 2020, https://algs4.cs.princeton.edu/32bst/, 


Greater than internal node; Right child is null

Figure 5 - Binary Search Tree Insertion (Comparison with Internal Node)


Figure 6 — Binary Search Tree Final Insertion 


It is to be noted that BSTs are not naturally self-balancing. There is no restriction on tree height. 
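To make the lack of a height restriction concrete, the sketch below inserts the same 1000 keys into a plain BST twice, once in sorted order and once shuffled, and reports the resulting number of levels. It uses an iterative insert and hypothetical names (BstHeightDemo, heightAfterInserting); it is an illustration under these assumptions rather than the essay's Appendix A code.

```java
import java.util.Random;

// Sketch: the height of an unbalanced BST depends on insertion order.
class BstHeightDemo {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Inserts the keys in the given order and returns the number of levels in the tree.
    static int heightAfterInserting(int[] keys) {
        Node root = null;
        int height = 0;
        for (int key : keys) {
            if (root == null) { root = new Node(key); height = 1; continue; }
            Node cur = root;
            int depth = 1;          // level of cur; the new node lands one level below
            boolean placed = false;
            while (!placed) {
                depth++;
                if (key < cur.key) {
                    if (cur.left == null) { cur.left = new Node(key); placed = true; }
                    else cur = cur.left;
                } else if (key > cur.key) {
                    if (cur.right == null) { cur.right = new Node(key); placed = true; }
                    else cur = cur.right;
                } else {
                    depth = 0;      // duplicate key: nothing is inserted
                    placed = true;
                }
            }
            height = Math.max(height, depth);
        }
        return height;
    }

    // Fisher-Yates shuffle of 0..n-1 with a fixed seed for repeatability.
    static int[] shuffled(int n, long seed) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        Random rng = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
        return a;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = i;
        System.out.println("sorted input height:   " + heightAfterInserting(sorted));
        System.out.println("shuffled input height: " + heightAfterInserting(shuffled(n, 42L)));
    }
}
```

Sorted input degenerates into a chain with one node per level, whereas randomized input, as used in this investigation, keeps the tree much shallower.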


After insertion, the second half of tree sort entails performing a traversal on the BST. A Depth-First Traversal algorithm, which traverses a BST branch-wise rather than level-wise (Breadth-First Traversal), is more appropriate in this case since we need to reach the leaves of the BST (lowest and highest values) quickly. Furthermore, an Inorder traversal, which first traverses the left sub-tree, visits the root, and then traverses the right sub-tree, allows the BST values to be returned in sorted order.¹³ This is shown below:


13 Javinpaul, “How to Implement Inorder Traversal in a Binary Search Tree?,” DEV Community (DEV Community, August 14, 2019), accessed July 24, 2020, https://www.dev.to/javinpaul/how-to-implement-inorder-traversal-in-a-binary-search-tree-1787.

Figure 7 - Sorted Binary Search Tree
Hence the algorithm is divided into two methods, the insert() method and dfs() method. Each 
node is represented by an object with three instance variables. One being the integer value of 
the node, and the other two being pointers to the left and right child nodes. The insert() method 
has two parameters: the root node object, and the integer value to be inserted into the Binary 
Search Tree. 

public Node insert(Node node, int key) {
    if (node == null) {
        node = new Node(key); // Creating a new tree
        return node;
    }
    if (key < node.key)
        node.left = insert(node.left, key);
    else if (key > node.key)
        node.right = insert(node.right, key);
    // Equal keys fall through: duplicates are not inserted
    return node;
}

Figure 8 - Tree Sort Insert Function (Appendix A)¹⁴


If a root node is not present, a new BST is created. However, if the value to be inserted is less than the value of the root node, then the insert() method is recursively called on the left sub-tree until a base case is reached where the left or right child node is empty, after which a new node is inserted as a leaf. If the value to be inserted is greater than the value of the root node, then the insert() method is recursively called on the right sub-tree instead and the process is repeated.

14 Vibin M, “Tree Sort," GeeksforGeeks, April 20, 2020, accessed August 1, 2020, https://www.geeksforgeeks.org/tree-sort/.
public void dfs(Node node) {
    if (node != null) {
        dfs(node.left); // Recursing down the left sub-tree
        System.out.print(node.key + ", ");
        dfs(node.right); // Recursing down the right sub-tree
    }
}

Figure 9 - Inorder Traversal Function (Appendix A)¹⁵

In the dfs() method, the method recurses down the left sub-tree until the base case, a left child leaf, is reached, in which case its value is printed, followed by the value of the parent node, followed by the value of the right child leaf. After the left sub-tree is recursively traversed, the root node is printed, and finally, the method recurses down the right sub-tree. This would output the BST in sorted order.
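The two halves of Tree Sort can be combined into a small self-contained sketch that returns the sorted values in an array instead of printing them. The names TreeSortSketch, fill and sort are hypothetical. Unlike Figure 8, this sketch sends equal keys to the right sub-tree so that duplicates are kept; this is an assumption made for the illustration, not the behaviour of the essay's implementation.

```java
import java.util.Arrays;

// Sketch: Tree Sort as BST insertion followed by an inorder traversal.
class TreeSortSketch {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Recursive insertion; equal keys go right so duplicates are preserved (assumption).
    static Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else node.right = insert(node.right, key);
        return node;
    }

    // Inorder traversal: left sub-tree, node, right sub-tree.
    // Writes keys into out starting at pos and returns the next free position.
    static int fill(Node node, int[] out, int pos) {
        if (node == null) return pos;
        pos = fill(node.left, out, pos);
        out[pos++] = node.key;
        return fill(node.right, out, pos);
    }

    static int[] sort(int[] data) {
        Node root = null;
        for (int value : data) root = insert(root, value);
        int[] out = new int[data.length];
        fill(root, out, 0);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new int[]{5, 3, 2, 4, 1}))); // [1, 2, 3, 4, 5]
    }
}
```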


The average time complexity of the Tree Sort, Θ(n log(n)), can be broken down. For insert(), n integers must be inserted into the tree, and the time taken to recurse down the tree for each insertion is O(log(n)) on average, since the number of levels in a BST built from randomized data increases logarithmically with respect to the number of nodes; building the tree therefore takes O(n log(n)). The dfs() traversal visits each of the n nodes exactly once and therefore takes O(n). Adding the complexities, the lower-order term can be ignored and the overall complexity comes to O(n log(n)).


15 ibid. 


2.3 Heap Sort and Binary Heaps 


A Binary Heap, specifically a Max Heap, is a binary tree with properties different to those of a BST. The value of each node must be greater than or equal to the values of its child nodes. Hence, unlike a BST, values in a max heap increase from bottom to top instead of from left to right. A binary heap is also filled level by level from left to right, which keeps it naturally balanced.¹⁶ An example is shown below:


Figure 10 — Max Binary Heap Example 


This naturally self-balancing property allows Max Heaps to be represented through arrays, where if a node is at index i, then the left child is at index 2i + 1, and the right child is at index 2i + 2.¹⁷ For Heap Sort, a Max Heap must first be built by rearranging the array using a reverse breadth first traversal:


1. Beginning from the last node in the n - 1 level of the tree (n is the number of levels), if the node is greater than both child nodes, the sub-tree is already heapified.
2. However, if the node is less than either or both child nodes, it is swapped with the greater child node. Similarly, all sub-trees on the n - 1 level must be heapified.
3. Move to level n - 2 and repeat the process from right to left. If nodes are swapped, the affected sub-trees must be recursively re-heapified.
4. Once the traversal reaches the root node, the binary tree has been heapified into a Max Heap as depicted below:

16 Navjot Singh, “Why Is Binary Heap Never Unbalanced?,” Computer Science Stack Exchange, May 2, 2019, accessed August 18, 2020, https://cs.stackexchange.com/questions/108852/why-is-binary-heap-never-unbalanced.

17 “Binary Heaps,” Heaps, Andrew CMU, accessed August 18, 2020, http://www.andrew.cmu.edu/course/15-121/lectures/Binary%20Heaps/heaps.html.


n - 1 sub-level; Swapped - Less than right child

Figure 11 - Max Heap Heapification (Comparison and Swap Within Right Sub-Tree)


Swapped - Less 
than right child 


Figure 12 - Max Heap Heapification (Comparison and Swap Within Left Sub-Tree) 


Swapped - Less than right child; Heapified

Figure 13 - Max Heap Heapification (Comparison with Root Node and Re-Heapification)

Figure 14 - Complete Max Heap after Heapification


After the tree has been heapified, the root node is swapped with the last leaf node and added 
to the end of the array. The reduced heap is then re-heapified. This process of swapping the 


root with the last leaf and re-heapifying is repeated until the array is sorted — 


Swapped; Swap to Re-Heapify; Removed from Heap

Figure 16 - Sorted Max Binary Heap

Hence, the primary method in Heap Sort is heapify(), which restores the Max Heap structure of the Binary Heap. This method is iteratively called by two loops in the sort() method, one that builds the Max Heap, and one that repeatedly extracts the root node and sorts the array. heapify() has three parameters: the array to be sorted, the array length, and the array index of the root of a sub-tree to be heapified.

void heapify(int arr[], int n, int i) {
    int largest = i; // Initializing largest as root
    int l = 2 * i + 1; // Left child = 2*i + 1
    int r = 2 * i + 2; // Right child = 2*i + 2

    if (l < n && arr[l] > arr[largest])
        largest = l;

    if (r < n && arr[r] > arr[largest])
        largest = r;

    if (largest != i) {
        int swap = arr[i];
        arr[i] = arr[largest];
        arr[largest] = swap;

        heapify(arr, n, largest); // Recursively heapifying
    }
}

Figure 17 - Heap Sort Heapify Function (Appendix B)¹⁸


The indices of an internal node and its children are represented by the variables i, l, and r, with largest tracking the index of the greatest of the three values. If either child node is larger than the other child and the parent node, then largest is reassigned to that child's index, the values of the child and parent nodes are swapped, and the affected sub-tree¹⁹ rooted at the largest index is recursively heapified until a base case is reached where both child nodes are less than or equal to the parent node.
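The array index arithmetic and the bottom-up construction described in this section can be sketched as follows. The helper names (left, right, parent, siftDown, buildMaxHeap, isMaxHeap) are hypothetical, and siftDown is an iterative variant of the recursive heapify() in Figure 17 rather than a copy of it.

```java
import java.util.Arrays;

// Sketch: array representation of a Max Heap and bottom-up heap construction.
class MaxHeapBuildSketch {
    static int left(int i)   { return 2 * i + 1; }
    static int right(int i)  { return 2 * i + 2; }
    static int parent(int i) { return (i - 1) / 2; }

    // Moves a[i] down until neither child within the first n elements is larger.
    static void siftDown(int[] a, int n, int i) {
        while (true) {
            int largest = i;
            int l = left(i), r = right(i);
            if (l < n && a[l] > a[largest]) largest = l;
            if (r < n && a[r] > a[largest]) largest = r;
            if (largest == i) return;
            int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
            i = largest;
        }
    }

    // Heapifies every internal node, from the last one back to the root.
    static void buildMaxHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            siftDown(a, a.length, i);
        }
    }

    // Checks the heap property: no node is smaller than its child.
    static boolean isMaxHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[parent(i)] < a[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7};
        buildMaxHeap(a);
        System.out.println(Arrays.toString(a) + " isMaxHeap=" + isMaxHeap(a));
    }
}
```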


public void sort(int arr[]) {
    int n = arr.length;

    for (int i = n / 2 - 1; i >= 0; i--) // Building max heap
        heapify(arr, n, i);

    for (int i = n - 1; i >= 0; i--) {
        int temp = arr[0]; // Moving current root to end
        arr[0] = arr[i];
        arr[i] = temp;

        heapify(arr, i, 0); // Heapifying reduced heap
    }
}

Figure 18 - Sort Function: Building and Sorting Max Heap (Appendix B)²⁰

18 Shivi Aggarwal, "HeapSort," GeeksforGeeks, Last Modified November 16, 2020, accessed August 18, 2020, https://www.geeksforgeeks.org/heap-sort/.
19 ibid.


The first loop performs a reverse breadth first traversal by calling heapify() on all sub-trees rooted at nodes level-wise, from node i = n/2 - 1 (where n is the array length) to node i = 0, in order to construct a Max Heap, i.e., the loop disregards the leaves of the heap. The second loop performs the actual Heap Sort by iteratively swapping the values of the first and last nodes, and then calling heapify() on the root of the reduced heap, whose number of nodes decreases from n - 1 to 0 throughout loop execution as the array is sorted.
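For reference, the two methods can be put together into one runnable sketch and checked against Java's built-in Arrays.sort(). The class name HeapSortSketch, the static method layout and the seeded test data are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch: complete Heap Sort using a recursive heapify, checked against Arrays.sort.
class HeapSortSketch {

    // Restores the Max Heap property for the sub-tree rooted at i within arr[0..n-1].
    static void heapify(int[] arr, int n, int i) {
        int largest = i;
        int l = 2 * i + 1;
        int r = 2 * i + 2;
        if (l < n && arr[l] > arr[largest]) largest = l;
        if (r < n && arr[r] > arr[largest]) largest = r;
        if (largest != i) {
            int swap = arr[i];
            arr[i] = arr[largest];
            arr[largest] = swap;
            heapify(arr, n, largest);
        }
    }

    static void sort(int[] arr) {
        int n = arr.length;
        // Build the Max Heap from the last internal node up to the root.
        for (int i = n / 2 - 1; i >= 0; i--) heapify(arr, n, i);
        // Repeatedly move the current maximum to the end and shrink the heap.
        for (int i = n - 1; i > 0; i--) {
            int temp = arr[0];
            arr[0] = arr[i];
            arr[i] = temp;
            heapify(arr, i, 0);
        }
    }

    public static void main(String[] args) {
        int[] small = {5, 3, 2, 4, 1};
        sort(small);
        System.out.println(Arrays.toString(small)); // [1, 2, 3, 4, 5]

        // 10000 seeded pseudo-random integers in [-100000, 100000].
        int[] data = new Random(7L).ints(10000, -100000, 100001).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data);
        System.out.println("matches Arrays.sort: " + Arrays.equals(data, expected));
    }
}
```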


The average time complexity of Heap Sort is Θ(n log(n)). The heapify() function runs in O(log(n)) time since the number of levels to be recursed down in a heap or a sub-tree increases logarithmically with respect to the number of nodes. Hence, when building and sorting the Max Heap, n integers must be inserted and outputted respectively and re-heapification takes place after each. Therefore, by adding both linearithmic complexities, the constant can be ignored and the overall complexity becomes O(n log(n)).


3. Hypothesis 


It is evident that both sorting algorithms have an average time complexity of Θ(n log(n)). However, multiple stark contrasts are present in the properties of both data structures and the respective algorithm designs, such as the contrast between a Binary Heap's self-balancing²¹ to


20 ibid. 
21 “CS 312 Lecture 25: Priority Queues and Binary Heaps,” Lecture 25: Priority Queues and Binary Heaps, accessed August 
18, 2020, https://www.cs.cornell.edu/courses/cs312/2007sp/lectures/lec25.html. 




a normal BST’s unbalanced nature or the contrast between the Heap Sort’s partially iterative 
logic to a Tree sort's purely recursive logic. As a result, while the trends in algorithm execution 
time might be similar between both algorithms, the actual execution times for sorting very large 
integer datasets could be substantially different, perhaps lower for Heap Sort due to its asymptotically balanced nature and space-efficient array implementation.²²


Therefore, the aim of this experiment is to test the effect of increasing randomized dataset size 
n on the time taken t by both Tree Sort and Heap Sort to sort the dataset in increasing order. 
The relationships between the two variables will be comparatively analyzed between both 
algorithms. Moreover, for deeper analysis, the range R of the datasets will be changed as an 


auxiliary independent variable to determine any additional effect on sorting performance. 


I hypothesize that the Heap Sort will sort the dataset in lower time than the Tree Sort. There will be a linearithmic relationship between n and t.


4. Methodology 


4.1 Independent Variable 


The independent variable is the size of the integer datasets n. The sizes will increase from 10000 integers to 100000 integers in increments of 10000 in order to acquire enough data points to plot accurate graphs. Furthermore, for each size n, three datasets will be generated with ranges of 2 × 10⁵, 4 × 10⁵, and 6 × 10⁵ respectively. All integer datasets will have a completely randomized (discrete uniform) distribution. Moreover, the datasets will contain both positive and negative integers. An online random number generator will be used to generate them.
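The datasets in this investigation come from an online generator. Purely as an illustration of the same specification (n integers, discrete uniform, symmetric about zero), an offline equivalent could be sketched in Java as below. The class DatasetGeneratorSketch, its generate() method and the seed values are hypothetical and were not used to produce the essay's data.

```java
import java.util.Random;

// Sketch: generate n integers uniformly from [-range/2, range/2].
class DatasetGeneratorSketch {

    // range is the full width of the interval (e.g. 200000 for [-100000, 100000]);
    // an even range is assumed so the interval is symmetric about zero.
    static int[] generate(int n, int range, long seed) {
        Random rng = new Random(seed);
        int half = range / 2;
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = rng.nextInt(range + 1) - half; // nextInt(bound) is uniform on [0, bound)
        }
        return data;
    }

    public static void main(String[] args) {
        int[] ranges = {200000, 400000, 600000};
        for (int range : ranges) {
            int[] data = generate(10000, range, 1L);
            double sum = 0;
            for (int value : data) sum += value;
            System.out.println("R=" + range + " n=" + data.length + " mean=" + sum / data.length);
        }
    }
}
```

A fixed seed makes each dataset reproducible, which an online generator does not guarantee.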


22 "OpenDSA Data Structures and Algorithms Modules Collection," 13.12. Heapsort - OpenDSA Data Structures and Algorithms Modules Collection, accessed August 20, 2020, https://opendsa-server.cs.vt.edu/ODSA/Books/Everything/html/Heapsort.html.

4.2 Dependent Variable

The dependent variable is the time taken t by each algorithm to sort integer datasets of increasing size and the three respective ranges, in nanoseconds. The nanoTime() method in the Stopwatch class will be used in order to determine the difference in time before and after the sorting execution with high precision and reduced random and systematic errors.
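A minimal sketch of this kind of measurement is shown below, using the standard System.nanoTime() call and averaging over several trials. The SortTimer class and its method names are hypothetical, and Arrays.sort() stands in for the Tree Sort and Heap Sort implementations.

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Sketch: time a sorting routine in nanoseconds and average over several trials.
class SortTimer {

    // Times one run on a copy of the data, so every trial sorts the same unsorted input.
    static long timeNanos(Consumer<int[]> sorter, int[] data) {
        int[] copy = Arrays.copyOf(data, data.length);
        long start = System.nanoTime();
        sorter.accept(copy);
        return System.nanoTime() - start;
    }

    // Integer average of the elapsed times over the given number of trials.
    static long averageNanos(Consumer<int[]> sorter, int[] data, int trials) {
        long total = 0;
        for (int k = 0; k < trials; k++) {
            total += timeNanos(sorter, data);
        }
        return total / trials;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 2, 4, 1};
        // Arrays.sort is only a placeholder for the algorithm under test.
        long average = averageNanos(a -> Arrays.sort(a), data, 3);
        System.out.println("average time /ns: " + average);
    }
}
```

Copying the array inside timeNanos() keeps the input unsorted between trials. Timings taken inside one JVM run can still be affected by just-in-time compilation, which is one possible source of the systematic error discussed later.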


4.3 Controlled Variables 


Variable | Description | Specifications
Computer and Operating System | The algorithms will be run on a Dell G3 3500 with Windows 10 Home. | Processor: Intel Core i7-10750H @ 3.0 GHz; OS: Windows 10 x64; Memory: 8GB RAM (DDR3 - 12800)
Integrated Development Environment | The IntelliJ IDEA IDE will be used under the Apache 2 license. | Version: Community Edition 2020.2.1; JDK and JRE: Java SE 8u261
Probability Distribution of Datasets | All datasets will have discrete uniform distribution within the given ranges. | RNG: PineTools
Mean of Datasets | The mean for all datasets will be within [-100, 100]. | The mean will be fairly constant since the datasets will be randomly distributed on both sides of the mid-range.
Data types | Only the int (32 bit) primitive data type will be used to represent the numbers. long (64 bit) will be used to store the sorting times in nanoseconds. |
Table 1 - List of Controlled Variables

4.4 Procedure

1. Set up both TreeSort.java and HeapSort.java files (refer Appendix A, B) in an IntelliJ project folder.
2. Using the PineTools random number generator, generate 10 randomized integer datasets each for the ranges of [-1 × 10⁵, 1 × 10⁵], [-2 × 10⁵, 2 × 10⁵], and [-3 × 10⁵, 3 × 10⁵] with number of integers n ranging from 10000 to 100000 (30 datasets in total).
3. Transfer the 30 datasets as properties (key-value pairs) into a .properties file (refer Appendix D).
4. Create a new file SortLauncher.java (refer Appendix C) and, using the Properties and FileInputStream classes, load all 30 datasets into an instance of the Properties class.
5. Access the required dataset using the getProperty() method of the Properties class and convert it into a String array.
6. Use the convertStringToIntegerArray() method to parse the String array and convert it into an integer array.
7. Finally, create instances of the HeapSort and TreeSort classes and run the sort() methods with the integer array (unsorted dataset) as the argument.
8. Refer to the terminal to record the sorting times for both the Heap Sort and the Tree Sort.
9. Re-run SortLauncher.java for all 30 datasets by changing the property being accessed in IntelliJ’s debug configuration. Perform 3 trials for each dataset and take average times for both sorting algorithms.
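Steps 3 to 6 can be sketched as follows. Since Appendices C and D are not reproduced here, the property key (n5.r10), the comma-separated value format and the body of convertStringToIntegerArray() are assumptions, and a StringReader replaces the FileInputStream so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Properties;

// Sketch: load a dataset stored as a property and convert it to an int array.
class DatasetLoaderSketch {

    // Parses .properties text; a file-based launcher would pass a FileInputStream to load() instead.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Hypothetical body for the conversion step named in the procedure.
    static int[] convertStringToIntegerArray(String[] tokens) {
        int[] values = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Integer.parseInt(tokens[i].trim());
        }
        return values;
    }

    // Looks up one dataset by key and splits it on commas (assumed delimiter).
    static int[] load(Properties props, String key) {
        String raw = props.getProperty(key);
        if (raw == null) {
            throw new IllegalArgumentException("No dataset stored under key " + key);
        }
        return convertStringToIntegerArray(raw.split(","));
    }

    public static void main(String[] args) {
        Properties props = parse("n5.r10=5, -3, 2, 4, -1\n");
        System.out.println(Arrays.toString(load(props, "n5.r10"))); // [5, -3, 2, 4, -1]
    }
}
```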


5. Data Processing and Graphing


5.1 Raw Data Collection 


It must be noted that both the sorting algorithms chosen are reliable, efficient, and concisely follow the expected algorithm paradigms. All applications were closed during algorithm execution and startup programs were disabled to free up RAM.

Tree Sort (R = 2 × 10⁵)

Size of Integer Dataset (n)   Time 1 /ns   Time 2 /ns   Time 3 /ns   Average Time (t) /ns
10000 5180800 5552200 4887200 5206733 
20000 6906800 8257100 6790100 7318000 
30000 10384800 11157500 9796600 10446300 
40000 13978400 13087600 13268100 13444700 
50000 16832100 17881100 17600200 17437800 
60000 20705700 19277100 20636100 20206300 
70000 26992600 24162600 24371900 25175700 
80000 30919700 28979400 27821400 29240167 
90000 32581000 34838000 32718600 33379200 
100000 36683100 37547000 38884900 37705000 


Table 2 - Tree Sort Sorting Times for Dataset Range of 2 × 10⁵

Tree Sort (R = 4 × 10⁵)

Size of Integer Dataset (n)   Time 1 /ns   Time 2 /ns   Time 3 /ns   Average Time (t) /ns
10000 5549400 5294000 5268000 5370467 
20000 8860400 6638000 6802800 7433733 
30000 10220600 10049500 10121800 10130633 
40000 12858600 13255600 13372200 13162133 
50000 18960000 16757900 16434400 17384100 
60000 22185400 20818300 20075100 21026267 
70000 24875900 24466000 29302900 26214933
80000 30017000 28094700 28928600 29013433 
90000 34962100 30965200 33269600 33065633 
100000 40465600 35123900 37169900 37586467 


Table 3 - Tree Sort Sorting Times for Dataset Range of 4 × 10⁵

Tree Sort (R = 6 × 10⁵)

Size of Integer Dataset (n)   Time 1 /ns   Time 2 /ns   Time 3 /ns   Average Time (t) /ns
10000 5091700 5427700 5304200 5274533
20000 8692600 6833000 7104800 7543467 
30000 10113000 10327300 9529500 9989933 
40000 13037200 13377000 13058100 13157433 
50000 17986600 17751900 17198000 17645500 
60000 20253300 22187500 21088400 21176400 
70000 26975100 24872600 24872600 25573433 
80000 28565400 27309300 29578600 28484433 
90000 32314900 31227900 32442500 31995100 
100000 35997400 37716000 36756000 36823133 


Table 4 - Tree Sort Sorting Times for Dataset Range of 6 × 10⁵

Heap Sort (R = 2 × 10⁵)

Size of Integer Dataset (n)   Time 1 /ns   Time 2 /ns   Time 3 /ns   Average Time (t) /ns
10000 3368400 3075900 3037200 3160500 
20000 4787400 5692900 5325300 5268533 
30000 7649300 6399900 6786700 6945300 
40000 7996800 9042800 8338200 8459267 
50000 11089600 9162200 10031800 10094533 
60000 12513000 12316200 13213400 12680867 
70000 14394300 14172500 15126500 14564433 
80000 16780200 16651600 15549900 16327233 
90000 18654000 20902600 19026000 19527533 
100000 22983400 23409800 22394000 22929067 


Table 5 - Heap Sort Sorting Times for Dataset Range of 2 × 10⁵

Heap Sort (R = 4 × 10⁵)

Size of Integer Dataset (n)   Time 1 /ns   Time 2 /ns   Time 3 /ns   Average Time (t) /ns
10000 4263500 3449400 2873100 3528667
20000 4763900 5012500 5104200 4960200
30000 6562400 6802300 7014200 6792967
40000 8907500 9081900 8131600 8707000
50000 10824800 10007900 11792000 10874900
60000 13105800 12784400 12922700 12937633 
70000 14211300 15064200 14191600 14489033 
80000 16805600 15610600 15613800 16010000 
90000 17475000 19076200 19892500 18814567 
100000 21968100 20979500 22872300 21939967 


Table 6 - Heap Sort Sorting Times for Dataset Range of 4 × 10⁵

Heap Sort (R = 6 × 10⁵)

Size of Integer Dataset (n)   Time 1 /ns   Time 2 /ns   Time 3 /ns   Average Time (t) /ns
10000 4030400 3227700 3234700 3497600 
20000 4942000 5500100 4996900 5146333 
30000 7501900 6442900 6890100 6944967 
40000 8816200 8823800 9018200 8886067 
50000 10778800 10092000 11111700 10660833 
60000 11765900 12241500 12317200 12108200 
70000 13450000 15232200 13602900 14095033 
80000 16182800 17037500 15025700 16082000 
90000 18873500 17315100 18102500 18097033 
100000 21992000 21936400 21617100 21848500 


Table 7 - Heap Sort Sorting Times for Dataset Range of 6 × 10⁵


5.2 Graphs and Curve Fitting 


The above average times have been graphed first comparatively between the two algorithms for each range, and then for each algorithm individually with all three ranges. Since all trends followed a linearithmic pattern, only some minor transformations were required in order to curve fit n log₂ n effectively. The two primary function transformations shown throughout are vertical dilation by a certain factor and vertical translation upwards by a certain number due to systematic error.

Tree Sort vs Heap Sort (R = 2 × 10⁵)

Graph 2 - Size of Dataset vs Sorting Time for Dataset Range of 4 × 10⁵

Tree Sort vs Heap Sort (R = 6 × 10⁵)

Graph 3 - Size of Dataset vs Sorting Time for Dataset Range of 6 × 10⁵

Tree Sort All 3 Ranges

Graph 4 - Size of Dataset vs Tree Sort Sorting Time for All 3 Dataset Ranges

Heap Sort All 3 Ranges

Graph 5 - Size of Dataset vs Heap Sort Sorting Time for All 3 Dataset Ranges


6. Analysis 


Therefore, in every trend present in the above graphs, it is evident that a linearithmic function is able to effectively model the data. However, to quantify the goodness of fit we must find the coefficient of determination, using the Pearson correlation coefficient, which has been computed and shown in the table below.

Algorithm Type   R = 2 × 10⁵   R = 4 × 10⁵   R = 6 × 10⁵
Tree Sort 0.9918 0.9921 0.9939 
Heap Sort 0.9858 0.9924 0.9894 


Table 8 - Coefficients of Determination for Best Fit Curves 
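One way such a fit and its coefficient of determination could be computed is an ordinary least-squares regression of t against x = n · log₂(n), which yields the dilation factor a and the vertical translation c of the model t = a · n · log₂(n) + c. The sketch below applies this to the Tree Sort averages from Table 2. The class LinearithmicFit and its methods are hypothetical, and this is not necessarily the fitting procedure used to produce the graphs above, so its output may differ from the reported trend lines.

```java
// Sketch: least-squares fit of t = a * n * log2(n) + c and its R^2.
class LinearithmicFit {

    // Transforms n so that the linearithmic model becomes linear in x.
    static double transform(double n) {
        return n * Math.log(n) / Math.log(2);
    }

    // Returns {a, c, rSquared} for the model t = a * transform(n) + c.
    static double[] fit(double[] n, double[] t) {
        int m = n.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < m; i++) {
            double x = transform(n[i]);
            sx += x;
            sy += t[i];
            sxx += x * x;
            sxy += x * t[i];
        }
        double a = (m * sxy - sx * sy) / (m * sxx - sx * sx);
        double c = (sy - a * sx) / m;

        // R^2 = 1 - (residual sum of squares) / (total sum of squares).
        double mean = sy / m;
        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < m; i++) {
            double predicted = a * transform(n[i]) + c;
            ssRes += (t[i] - predicted) * (t[i] - predicted);
            ssTot += (t[i] - mean) * (t[i] - mean);
        }
        return new double[]{a, c, 1.0 - ssRes / ssTot};
    }

    public static void main(String[] args) {
        // Average Tree Sort times for R = 2 x 10^5, taken from Table 2.
        double[] n = {10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000};
        double[] t = {5206733, 7318000, 10446300, 13444700, 17437800,
                      20206300, 25175700, 29240167, 33379200, 37705000};
        double[] result = fit(n, t);
        System.out.printf("a=%.3f  c=%.0f  R^2=%.4f%n", result[0], result[1], result[2]);
    }
}
```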


This shows that there is a very strong linearithmic relationship between the input dataset size and the sorting time. However, because Pearson correlation is inconclusive for non-linear relationships, nonparametric Spearman rank-order correlation coefficients ρ must also be computed.

Algorithm Type   R = 2 × 10⁵   R = 4 × 10⁵   R = 6 × 10⁵
Tree Sort 1.0000 1.0000 1.0000 
Heap Sort 1.0000 1.0000 1.0000 


Table 9 - Spearman Rank-Order Correlation Coefficients for Best Fit Curves 
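A coefficient of exactly 1 follows whenever the average time strictly increases with dataset size. This can be checked with the sketch below, which uses the Heap Sort averages from Table 5 and the textbook formula ρ = 1 − 6Σd² / (m(m² − 1)) for data without ties. The class SpearmanSketch and its methods are hypothetical.

```java
import java.util.Arrays;

// Sketch: Spearman rank-order correlation for data without tied values.
class SpearmanSketch {

    // Rank of each value (1 = smallest); assumes all values are distinct.
    static int[] ranks(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int[] rank = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            rank[i] = Arrays.binarySearch(sorted, values[i]) + 1;
        }
        return rank;
    }

    // rho = 1 - 6 * sum(d^2) / (m * (m^2 - 1)), where d is the difference in ranks.
    static double rho(double[] x, double[] y) {
        int m = x.length;
        int[] rx = ranks(x);
        int[] ry = ranks(y);
        double sumD2 = 0;
        for (int i = 0; i < m; i++) {
            double d = rx[i] - ry[i];
            sumD2 += d * d;
        }
        return 1.0 - 6.0 * sumD2 / (m * ((double) m * m - 1.0));
    }

    public static void main(String[] args) {
        // Average Heap Sort times for R = 2 x 10^5, taken from Table 5.
        double[] n = {10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000};
        double[] t = {3160500, 5268533, 6945300, 8459267, 10094533,
                      12680867, 14564433, 16327233, 19527533, 22929067};
        System.out.println(rho(n, t)); // 1.0
    }
}
```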


This shows that, along with the goodness of fit, all the XY (size vs time) values can be perfectly modelled with a monotonically increasing function²³, which in this case is linearithmic, hence supporting the efficacy of our model.


Finally, to determine the appropriacy of the linearithmic model, we must also use T-tests in order to find P-values for the data.

Algorithm Type   R = 2 × 10⁵   R = 4 × 10⁵   R = 6 × 10⁵
Tree Sort 1.23 x 10° 1.06 x 10° 3.71 x 1010 
Heap Sort 1.13 x 103 9.00 x 1010 3.50 x 10? 


Table 10 — P-Values of Tree and Heap Sort Data 


This shows that the data is highly statistically significant. Assuming the null hypothesis is that there is NOT a significant linearithmic relationship between n and t, this low P-value < 0.05 (α = significance level) provides strong evidence against the null hypothesis and in favour of the alternate hypothesis, i.e., the presence of a strong linearithmic relationship, which our best fit functions clearly support.


In contrast, the Heap Sort evidently has lower sorting times than the Tree Sort for all dataset sizes and all three ranges. For example, for R = 2 × 10⁵, the average time taken to sort 10000 integers by Heap Sort was 3160500 nanoseconds, around 39% lower than the 5206733 nanoseconds sorting time for the Tree Sort. In fact, as the size of the dataset increases, the average and instantaneous sorting time per integer for Heap Sort changes much more slowly than that of the


Tree Sort, making the difference between the performances of both algorithms significantly more pronounced. To examine this, we can compute the derivatives of the best fit functions:

23 A.N. Bowman, M. C. Jones, and I. Gijbels, "Testing Monotonicity of Regression," Journal of Computational and Graphical Statistics 7, no. 4 (1998): 489-500, accessed November 4, 2020, https://doi.org/10.2307/1390678.


Tree Sort (R = 2 × 10^5), using the Product Rule:

$$T(n) = 21\,n\log_2 n + (1.5 \times 10^6)$$

$$\Rightarrow T'(n) = 21\left(\log_2 n + n \cdot \frac{1}{n\ln 2}\right) = 21\left(\log_2 n + \frac{1}{\ln 2}\right)$$

Equation 3 - Tree Sort Trend Line Derivative

Heap Sort (R = 2 × 10^5):

$$H(n) = 12.5\,n\log_2 n + (1.1 \times 10^6)$$

$$\Rightarrow H'(n) = 12.5\left(\log_2 n + \frac{1}{\ln 2}\right)$$

Equation 4 - Heap Sort Trend Line Derivative

Graph 6 - Size of Dataset vs Tree and Heap Sort Instantaneous Sorting Times (Tree Sort vs Heap
Sort Derivatives, R = 2 × 10^5)

For example, at n = 10000, Tree Sort took 521 nanoseconds per integer on average and Heap Sort
took 316 nanoseconds per integer on average (t/n). However, at n = 100000, average times of 377
nanoseconds and 229 nanoseconds per integer were taken respectively.

Evaluating the derivatives at n = 10000, Tree Sort had an instantaneous sorting rate of about 309
nanoseconds per integer (T'(10000)), 68% greater than the Heap Sort's rate of about 184
nanoseconds per integer (H'(10000)). However, at n = 100000, the instantaneous sorting rates were
about 379 and 226 nanoseconds per integer respectively. This shows that the Heap Sort is sorting
in significantly lower time and that its sorting time also grows at a much lower rate than the
Tree Sort's, as depicted in the graph.
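The slope of a linearithmic best fit function t = a·n·log2(n) + c can be cross-checked numerically. By the product rule the analytic slope is a(log2 n + 1/ln 2), and the constant c drops out. The sketch below compares that expression with a central-difference estimate, using the coefficients 21 and 12.5 quoted for R = 2 × 10^5; the class and method names are hypothetical.

```java
class DerivativeSketch {

    static double log2(double n) {
        return Math.log(n) / Math.log(2);
    }

    // Linearithmic model t = a * n * log2(n) + c (time in nanoseconds).
    static double fit(double a, double c, double n) {
        return a * n * log2(n) + c;
    }

    // Analytic slope from the product rule: a * (log2(n) + 1 / ln 2).
    static double analyticSlope(double a, double n) {
        return a * (log2(n) + 1.0 / Math.log(2));
    }

    // Central-difference estimate of the slope of fit() at n.
    static double numericSlope(double a, double c, double n) {
        double h = 1.0;
        return (fit(a, c, n + h) - fit(a, c, n - h)) / (2 * h);
    }

    public static void main(String[] args) {
        double[] sizes = {10000, 100000};
        for (double n : sizes) {
            // The intercept is passed as 0 because it does not affect the slope.
            System.out.printf("n=%.0f tree: analytic=%.1f numeric=%.1f%n",
                    n, analyticSlope(21.0, n), numericSlope(21.0, 0.0, n));
            System.out.printf("n=%.0f heap: analytic=%.1f numeric=%.1f%n",
                    n, analyticSlope(12.5, n), numericSlope(12.5, 0.0, n));
        }
    }
}
```

Since both slopes share the factor (log2 n + 1/ln 2), their ratio stays at 21/12.5 = 1.68 for every n, which is where the 68% figure comes from.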


Furthermore, it is also evident that for both Tree Sort and Heap Sort, the sorting time trends for
the three ranges barely vary. Apart from slight increases and decreases in sorting times across
ranges, we cannot conclusively say whether t is proportionally related to R. In this case, a
Kruskal-Wallis one-way ANOVA test can be done on the data collected for the three ranges for both
algorithms. Tree Sort had a P-value of 0.9961 and Heap Sort had a P-value of 0.9885. Since both
P-values are far above the 0.05 significance level, we fail to reject the null hypothesis, i.e.,
the data gives no evidence of a relationship between R and t.


Finally, some random and systematic error is present in the data. As seen with the graphs for
R = 2 × 10^5, the y-intercept of the Tree Sort best fit function of 1.5 × 10^6 is around 36%
greater than the y-intercept of the Heap Sort best fit function of 1.1 × 10^6. Furthermore, points
such as (60000, 20206300) for Tree Sort and (50000, 10094533) for Heap Sort show significant
deviation from the best fit function. The reasons for these random and systematic errors will
be discussed in Evaluation.


7. Results Discussion & Evaluation 


Hence, as conclusive results have been obtained, numerous means can be used to justify the
same. Firstly, it is apparent that the naturally self-balancing nature of the binary heap gives it
an advantage. This is because for a similar number of inserted integers, a max-heap constructs
a tree with the minimum number of levels required (since the comparisons between adjacent
nodes are only done after the integers are inserted) while a BST is not concerned with the same
(since the comparisons between existing nodes and a to-be-inserted integer are done before
it is inserted). The same is illustrated in Figure 19.


Figure 19 - Balanced vs Unbalanced Binary Search Tree Example (panels: Max Heap, Binary Search Tree)


As shown, the max heap has the minimum of 3 levels required by 5 nodes, whereas the BST has 4
levels due to its unbalanced nature, so each insertion may have to walk a longer path from the
root and the depth-first traversal recurses more deeply. This is also supported by the fact that
the worst-case complexity of an unbalanced Tree Sort is O(n^2) (showing that a BST can be
constructed as a straight chain having as many levels as nodes)²⁴ while the Heap Sort's is
O(n log_2 n).
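The difference in the number of levels can be sketched as follows. This is only an illustration under stated assumptions: the class, the node type and the already-sorted five-key input are hypothetical and are not the trees drawn in Figure 19.

```java
class HeightSketch {

    static final class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // Plain (unbalanced) BST insertion; keys equal to a node go to its right.
    static Node insert(Node root, int key) {
        Node fresh = new Node(key);
        if (root == null) {
            return fresh;
        }
        Node current = root;
        while (true) {
            if (key < current.key) {
                if (current.left == null) {
                    current.left = fresh;
                    return root;
                }
                current = current.left;
            } else {
                if (current.right == null) {
                    current.right = fresh;
                    return root;
                }
                current = current.right;
            }
        }
    }

    // Number of levels in the tree rooted at node (0 for an empty tree).
    static int levels(Node node) {
        if (node == null) {
            return 0;
        }
        return 1 + Math.max(levels(node.left), levels(node.right));
    }

    // A binary heap of n >= 1 nodes is a complete tree: floor(log2 n) + 1 levels.
    static int heapLevels(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 3, 4, 5}; // hypothetical already-sorted input
        Node root = null;
        for (int key : keys) {
            root = insert(root, key);
        }
        System.out.println("BST levels:  " + levels(root));
        System.out.println("Heap levels: " + heapLevels(keys.length));
    }
}
```

With sorted input every key becomes a right child, so the BST degenerates into a chain of 5 levels, while a heap holding the same 5 integers always has 3.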


Another perspective that must be considered is that of Recursion vs Iteration. The Tree Sort is 
a purely recursive algorithm with both the insertion and traversal of every node being done 
recursively. The Heap Sort, contrarily, does heapification/insertion recursively but conducts 
the traversal iteratively. The heapification is also optimized since the algorithm iteratively visits 


the roots of sub-trees that need to be heapified after which recursion takes over. 


24 Alexa Ryder, "Tree Sort Algorithm," OpenGenus IQ: Learn Computer Science, March 18, 2018, accessed November 12, 2020,
https://iq.opengenus.org/tree-sort/.

The reason recursion is slower than iteration is that, when considering depth-first traversals as
used by the BST, each successive recursive call to the insert() or dfs() functions (refer
Appendix A) gets added as a stack frame to the top of a call stack (a linear data structure that
follows the last in first out principle)²⁵ from which the recursive subroutines take place. This
call stack thus requires additional overhead time and memory (iteration does not require this),
which helps explain why the Tree Sort is more time-intensive.
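For contrast with the recursive insert() and dfs() functions, a loop-based Tree Sort could look like the sketch below. It is an assumption-laden illustration rather than the code that was timed: the names are hypothetical, duplicates are kept (unlike Appendix A), and the traversal still needs an explicit stack, so it avoids call frames rather than all extra memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IterativeTreeSortSketch {

    static final class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // Loop-based insertion: walks down from the root without recursive calls.
    static Node insert(Node root, int key) {
        Node fresh = new Node(key);
        if (root == null) {
            return fresh;
        }
        Node current = root;
        while (true) {
            if (key < current.key) {
                if (current.left == null) {
                    current.left = fresh;
                    return root;
                }
                current = current.left;
            } else { // equal keys are sent to the right sub-tree
                if (current.right == null) {
                    current.right = fresh;
                    return root;
                }
                current = current.right;
            }
        }
    }

    // In-order traversal driven by an explicit stack instead of the call stack.
    static List<Integer> inOrder(Node root) {
        List<Integer> sorted = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>();
        Node current = root;
        while (current != null || !pending.isEmpty()) {
            while (current != null) {
                pending.push(current);
                current = current.left;
            }
            current = pending.pop();
            sorted.add(current.key);
            current = current.right;
        }
        return sorted;
    }

    static List<Integer> sort(int[] values) {
        Node root = null;
        for (int value : values) {
            root = insert(root, value);
        }
        return inOrder(root);
    }

    public static void main(String[] args) {
        System.out.println(sort(new int[] {42, 7, 19, 7, 3}));
    }
}
```

Whether this variant is actually faster would have to be measured; the sketch only shows that neither insertion nor traversal requires recursion.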


By the same token, it must also be realized that the O(1) space complexity²⁶ of the Heap
Sort is also a massive advantage compared to the O(n) space complexity of the Tree Sort. The
fact that the Heap Sort can use array indices as node pointers allows it to quickly sort the dataset
within the array itself. Contrarily, integers in the Tree Sort must be assigned to an object along
with two other pointers. Hence not only does a larger dataset require more time and memory
to create more objects, the fact that an integer itself has a 12 byte overhead in an object²⁷ is a
huge memory allocation time-waste for the Tree Sort.
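The use of array indices as node pointers can be sketched as below. The index formulas match those used by heapify() in Appendix B; the class name, the helper methods and the five-element array are hypothetical.

```java
class HeapIndexSketch {

    // In an array-backed binary heap the "pointers" are plain index arithmetic.
    static int parent(int i) {
        return (i - 1) / 2;
    }

    static int left(int i) {
        return 2 * i + 1;
    }

    static int right(int i) {
        return 2 * i + 2;
    }

    // True when every parent in the array is >= each of its children.
    static boolean isMaxHeap(int[] heap) {
        for (int i = 1; i < heap.length; i++) {
            if (heap[parent(i)] < heap[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] heap = {50, 30, 40, 10, 20}; // hypothetical 5-node max heap
        System.out.println("children of index 1: " + heap[left(1)] + ", " + heap[right(1)]);
        System.out.println("is max heap: " + isMaxHeap(heap));
    }
}
```

No node objects are allocated here, which is the reason the heap can be sorted inside the input array itself.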


Finally, the minimal systematic error can most plausibly be attributed to fixed start-up costs of
the Java runtime (for example class loading and just-in-time compilation of the sorting methods),
since at n = 0 the runtime should be negligible yet the y-intercept of the linearithmic functions
is not 0. Paging of virtual memory on the PC could also have contributed to this lag. The random
errors could have been caused by algorithm runtime being affected by constantly changing CPU
clock speeds due to varying processor load.
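One common way to reduce both kinds of error is to run untimed warm-up passes and then report the median of several timed runs. The following is a minimal sketch of that idea, not the procedure used in this essay; the names are hypothetical and Arrays.sort stands in for the algorithm under test.

```java
import java.util.Arrays;
import java.util.Random;

class TimingSketch {

    // Median of several timed runs, taken after untimed warm-up runs.
    static long medianNanos(int[] data, int warmUps, int runs) {
        for (int i = 0; i < warmUps; i++) {
            Arrays.sort(data.clone()); // untimed: lets the JVM load and compile code
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            int[] copy = data.clone(); // fresh unsorted copy for every run
            long start = System.nanoTime();
            Arrays.sort(copy); // stand-in for the Heap Sort or Tree Sort call
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Hypothetical dataset: 10000 integers in [0, 200000) from a fixed seed.
        int[] data = new Random(1).ints(10000, 0, 200000).toArray();
        System.out.println("median ns: " + medianNanos(data, 20, 11));
    }
}
```

The median is less sensitive than the mean to occasional slow runs, such as those caused by a change in clock speed part-way through a measurement.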


8. Conclusion 


Therefore, with reference to my hypothesis, "I hypothesize that the Heap Sort will sort the
dataset in lower time than the Tree Sort. There will be a linearithmic relationship between n
and t", this experiment was able to comparatively determine the time complexities and hence
the sorting efficiency of both algorithms and provide conclusive evidence that the Heap Sort
sorted in lower time than the Tree Sort for every dataset size and range tested, and that the
relationship between the size of the randomized integer dataset (n) and the time taken to sort (t)
is most closely linearithmic, supporting my hypothesis.

25 "4.3. What Is a Stack?," Problem Solving with Algorithms and Data Structures, accessed November 12, 2020,
https://www.runestone.academy/runestone/books/published/pythonds/BasicDS/WhatisaStack.html.

26 Time Complexity and Space Complexity comparison of Sorting Algorithms, Scanftree, accessed November 12, 2020,
https://www.scanftree.com/Data_Structure/time-complexity-and-space-complexity-comparison-of-sorting-algorithms.

27 Vladimir Roubtsov, "Java Tip 130: Do You Know Your Data Size?," InfoWorld (JavaWorld, August 16, 2002),
accessed November 12, 2020, https://www.infoworld.com/article/2077496/java-tip-130--do-you-know-your-data-size-.html.


9. Further Scope 


Therefore, considering that the primary limitation of the BST is that it is unbalanced, the
performance of the BST can be improved by using a self-balancing red-black tree for insertions
in order to avoid skewed trees and consequently worst-case complexities. In addition to this,
adaptive variants of both sorting algorithms (adaptive heap sorts and splay sorts)²⁸ could also
reduce running time by exploiting any partially ordered input data.
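A red-black tree does not have to be written from scratch to try this idea: the Java documentation describes java.util.TreeMap as a red-black tree based map. The sketch below is one possible Tree Sort built on it, with hypothetical class and method names; storing a count per key is an assumption made here so that duplicate integers are preserved.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class BalancedTreeSortSketch {

    // Tree Sort on a self-balancing tree: keys are the integers,
    // values count how many times each integer occurs.
    static int[] sort(int[] values) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int value : values) {
            counts.merge(value, 1, Integer::sum); // O(log n) insertion or update
        }
        int[] sorted = new int[values.length];
        int next = 0;
        // entrySet() iterates in ascending key order, i.e. an in-order traversal.
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            for (int c = 0; c < entry.getValue(); c++) {
                sorted[next++] = entry.getKey();
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new int[] {5, 1, 4, 1, 3})));
    }
}
```

Because the tree rebalances on insertion, already-sorted input no longer produces a chain, at the cost of extra work per insertion for rotations and recolouring.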


Furthermore, the Tree Sort's space-inefficient recursive logic can be replaced with an iterative
variant of the algorithm so that the additional time required by the call stack can be avoided.
Similarly, utilizing a ternary instead of a binary heap could be useful since the height of the
tree could now be decreased to log_3 n from log_2 n.²⁹ So, while the comparisons per level
would increase, the number of levels recursed through would be lower.
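A ternary Heap Sort differs from the binary version mainly in its index arithmetic: the children of index i sit at 3i + 1, 3i + 2 and 3i + 3. The following sketch shows one way this could be written, with hypothetical names and an iterative sift-down; it has not been benchmarked against the binary version used in this essay.

```java
import java.util.Arrays;

class TernaryHeapSortSketch {

    // Restores the max-heap property below index i in a 3-ary heap of size n.
    static void siftDown(int[] arr, int n, int i) {
        while (true) {
            int largest = i;
            for (int k = 1; k <= 3; k++) { // up to three children per node
                int child = 3 * i + k;
                if (child < n && arr[child] > arr[largest]) {
                    largest = child;
                }
            }
            if (largest == i) {
                return;
            }
            int swap = arr[i];
            arr[i] = arr[largest];
            arr[largest] = swap;
            i = largest; // continue one level further down
        }
    }

    static void sort(int[] arr) {
        int n = arr.length;
        // (n - 2) / 3 is the parent of the last element, i.e. the last non-leaf.
        for (int i = (n - 2) / 3; i >= 0; i--) {
            siftDown(arr, n, i);
        }
        for (int end = n - 1; end > 0; end--) {
            int root = arr[0]; // move current maximum to the end
            arr[0] = arr[end];
            arr[end] = root;
            siftDown(arr, end, 0);
        }
    }

    public static void main(String[] args) {
        int[] data = {9, 4, 7, 1, 8, 2, 7};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

Each sift-down step makes up to three comparisons instead of two, which is the trade-off against the smaller number of levels.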


Ultimately, the differences between the running times for datasets of various ranges could have 
been made more significant if larger ranges of long data type integers were used. The type of 
integer distribution used such as Gaussian or Poisson distributions could also be added as 


another complex parameter. 
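Datasets with a different distribution could be generated as sketched below. This is only an illustration: the class and method names are hypothetical, and the centre (range / 2) and spread (range / 6) of the Gaussian are arbitrary choices made here, not values taken from the experiment.

```java
import java.util.Random;

class DatasetSketch {

    // n integers drawn uniformly from [0, range).
    static int[] uniform(int n, int range, long seed) {
        Random random = new Random(seed);
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = random.nextInt(range);
        }
        return data;
    }

    // n integers from a Gaussian centred on range / 2, clamped into [0, range).
    static int[] gaussian(int n, int range, long seed) {
        Random random = new Random(seed);
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            double value = range / 2.0 + random.nextGaussian() * (range / 6.0);
            data[i] = (int) Math.max(0, Math.min(range - 1, Math.round(value)));
        }
        return data;
    }

    public static void main(String[] args) {
        int[] flat = uniform(10, 200000, 42L);
        int[] bell = gaussian(10, 200000, 42L);
        System.out.println(java.util.Arrays.toString(flat));
        System.out.println(java.util.Arrays.toString(bell));
    }
}
```

A Gaussian dataset clusters around the centre of the range and so contains more repeated values than a uniform one of the same size, which matters for any implementation that treats duplicates specially.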


28 Alistair Moffat, Splaysort: Fast, Versatile, Practical, accessed November 12, 2020,
https://people.eng.unimelb.edu.au/ammoffat/abstracts/spe.splay.html.

29 Kosmopo, "set3solutions," University of Texas at Arlington, accessed November 12, 2020,
http://ranger.uta.edu/~kosmopo/cse5311/homework/set3solution.pdf.
Appendices


Appendix A - TreeSort.java³⁰


package com.company;

public class TreeSort {

    public static class Node {
        int key; // Integer value of the node
        Node left, right; // Pointers to left and right child nodes

        public Node(int item) {
            key = item;
            left = right = null;
        }
    }

    Node root;

    public TreeSort() {
        root = null;
    }

    public Node insert(Node node, int key) {
        if (node == null) {
            node = new Node(key); // Creating a new tree
            return node;
        }
        if (key < node.key)
            node.left = insert(node.left, key);

        else if (key > node.key)
            node.right = insert(node.right, key);

        // A key equal to node.key matches neither branch and is not inserted
        return node;
    }

    public void dfs(Node node) {
        if (node != null) {
            dfs(node.left); // Recursing down the left sub-tree
            int nodeValue = node.key;
            dfs(node.right); // Recursing down the right sub-tree
        }
    }

    public void sort(int[] arr) {
        long startTime = System.nanoTime(); // Stopwatch start

        for (int i : arr) {
            root = insert(root, i);
        }

        dfs(root);

        long stopTime = System.nanoTime(); // Stopwatch stop

        System.out.println("\n\nTree Sort Start Time: " + startTime);
        System.out.println("\nTree Sort Stop Time: " + stopTime);
        System.out.println("\nTime Taken To Tree Sort: " + (stopTime - startTime));
    }
}


30 үріп M, "Tree Sort," GeeksforGeeks, April 20, 2020, accessed August 1, 2020, https://www.geeksforgeeks.org/tree-sort/.


Appendix B - HeapSort.java³¹


package com.company;

public class HeapSort {

    public void sort(int[] arr) {
        int n = arr.length;

        long startTime = System.nanoTime(); // Stopwatch start

        for (int i = n / 2 - 1; i >= 0; i--) // Building max heap
            heapify(arr, n, i);

        for (int i = n - 1; i > 0; i--) {
            int temp = arr[0]; // Moving current root to end
            arr[0] = arr[i];
            arr[i] = temp;

            heapify(arr, i, 0); // Heapifying reduced heap
        }
        long stopTime = System.nanoTime(); // Stopwatch stop

        System.out.println("\nHeap Sort Start Time: " + startTime);
        System.out.println("\nHeap Sort Stop Time: " + stopTime);
        System.out.println("\nTime Taken To Heap sort: " + (stopTime - startTime));

        System.out.println("\nHeap Sorted Array Is: ");
        printArray(arr);
    }

    void heapify(int[] arr, int n, int i) {
        int largest = i; // Initializing largest as root
        int l = 2 * i + 1; // Left child (index = 2i + 1)
        int r = 2 * i + 2; // Right child (index = 2i + 2)

        if (l < n && arr[l] > arr[largest])
            largest = l;

        if (r < n && arr[r] > arr[largest])
            largest = r;

        if (largest != i) {
            int swap = arr[i];
            arr[i] = arr[largest];
            arr[largest] = swap;

            heapify(arr, n, largest); // Recursively heapifying
        }
    }

    static void printArray(int[] arr) {
        for (int i : arr) {
            System.out.print(i + " ");
        }
        System.out.println();
    }
}


31 Shivi Aggarwal, "HeapSort," GeeksforGeeks, Last Modified November 16, 2020, accessed August 18, 2020,
https://www.geeksforgeeks.org/heap-sort/.


Appendix C - SortLauncher.java


package com.company;

import java.io.IOException;
import java.io.FileInputStream;
import java.util.Properties; // See footnote 32

public class SortLauncher { // Program to run both sorting algorithms

    public static int[] convertStringToIntegerArray(String[] string) {
        int[] arr = new int[string.length];

        for (int i = 0; i < string.length; i++)
            arr[i] = Integer.parseInt(string[i]);

        return arr;
    }

    public static void main(String[] args) throws IOException {

        FileInputStream fis = new FileInputStream("src/com/company/numbers.properties");
        Properties prop = new Properties();
        prop.load(fis); // Loading number.properties file

        System.out.println(args[0] + ": " + prop.getProperty(args[0])); // This argument is the property's key

        String[] inputNumber = prop.getProperty(args[0]).split(", ");

        int[] unsortedArray = convertStringToIntegerArray(inputNumber);

        HeapSort heap = new HeapSort();
        heap.sort(unsortedArray); // Inputting preferred dataset

        unsortedArray = convertStringToIntegerArray(inputNumber);
        TreeSort tree = new TreeSort();

        tree.sort(unsortedArray); // Inputting preferred dataset
    }
}


32 "Properties Class (Java Platform SE 8)," Oracle Java Documentation, accessed October 24, 2020,
https://docs.oracle.com/javase/8/docs/api/java/util/Properties.html.


Appendix D - number.properties (screenshot) 